Statistical Properties of Open Reading Frames in Complete Genome Sequences
Author
Abstract
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (17 prokaryotic genomes and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The influence of CG% on the size distribution is addressed. Compared with archaeal and eubacterial genomes, yeast chromosomes tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames in the six reading frames and on the two strands are distributed similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at the three codon positions is analyzed separately. In these genome sequences, the first codon position is shown to be G- and A-rich (i.e., purine-rich); A-rich and T-rich branches co-exist at the second codon position; and the third codon position is weakly T-rich.
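The quantities named in the abstract (open reading frames in six reading frames, rank-size data, and base composition per codon position) can be illustrated with a minimal Python sketch. It is not the paper's procedure. It assumes that an open reading frame is any in-frame stretch ending at a stop codon of the standard genetic code, beginning either after the previous stop or at the start of the frame, and that lengths are counted in nucleotides. The function names and the `min_codons` parameter are hypothetical.

```python
from collections import Counter

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Scan all six reading frames (three per strand).

    Returns a list of (strand, frame, orf_sequence) tuples. Each ORF is the
    stretch before an in-frame stop codon (stop excluded); a trailing
    stretch with no closing stop codon is dropped.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i]))
                    start = i + 3
    return orfs


def rank_size(orfs):
    """Return (rank, length) pairs, longest ORF first (rank-size data)."""
    lengths = sorted((len(orf) for _, _, orf in orfs), reverse=True)
    return list(enumerate(lengths, start=1))


def codon_position_composition(orfs):
    """Return base frequencies at codon positions 1, 2 and 3."""
    counts = [Counter(), Counter(), Counter()]
    for _, _, orf in orfs:
        for i, base in enumerate(orf):
            counts[i % 3][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return result


if __name__ == "__main__":
    toy = "ATGAAATAGCCCTGA"  # toy sequence, not real genome data
    found = find_orfs(toy)
    print(found)
    print(rank_size(found))
    print(codon_position_composition(found))
```

Plotting the rank-size pairs on log-log axes would give a Zipf-style plot of the kind the abstract mentions; the per-position frequencies are the raw input for any comparison of purine or T richness between codon positions.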
Similar resources
Genomic Sequences of Five Helicoverpa armigera Nucleopolyhedrovirus Genotypes from Spain That Differ in Their Insecticidal Properties
Helicoverpa armigera nucleopolyhedrovirus (HearNPV) has proved effective as the basis for various biological insecticides. Complete genome sequences of five Spanish HearNPV genotypes differed principally in the homologous regions (hrs) and the baculovirus repeat open reading frame (bro) genes, suggesting that they may be involved in the phenotypic differences observed among genotypes.
Complete Genome Sequence of Human Coronavirus NL63 CN0601/14, First Isolated in South Korea
We report here the complete genome sequence of the human coronavirus NL63 CN0601/14 strain, first isolated from South Korea. It contains 18-nucleotide discontinuous deletions of the open reading frame 1a (ORF1a) and spike regions. This study will aid in our understanding of the complete genome sequences of isolated coronaviruses in South Korea.
Complete Genome Sequences of Two Bovine Viral Diarrhea Viruses Isolated from Brain Tissues of Nonambulatory (Downer) Cattle
Here, we report the complete genome sequences of two bovine viral diarrhea viruses (BVDVs) (strains 11F011 and 12F004) isolated from brain tissues from nonambulatory (downer) cattle. The complete genomes of strains 11F011 and 12F004 contain 12,287 nucleotides (nt) with a single large open reading frame and 12,301 nt with a single large open reading frame, respectively. Phylogenetic analysis ind...
Complete genome sequences of novel canine noroviruses in Hong Kong.
We report the complete genome sequences of two novel isolates of norovirus isolated from the fecal swab specimens of dogs in Hong Kong. The complete viral genome is approximately 7.6 kb in length and consists of 3 overlapping open reading frames encoding the ORF1 polyprotein, VP1, and VP2, respectively. Analysis of the VP1 sequence suggested that these noroviruses are divergent from known norov...
Whole-Genome Sequences of Two Beak and Feather Disease Viruses in the Endangered Swift Parrot (Lathamus discolor)
Two complete genomes of beak and feather disease virus (BFDV) were characterized from Lathamus discolor, the Australian swift parrot. This is the first report of BFDV complete genome sequences in this host. The completed BFDV genomes consist of 1,984 nucleotides encoding two open reading frames with 99.7% pairwise nucleotide identity.
Journal:
- Computers & chemistry
Volume 23, Issue 3-4
Pages -
Publication date 1999